International Conference on Parallel Processing Shared Data Placement Optimizations to Reduce Multiprocessor Cache Miss Rates

Author

Monica S. Lam

Abstract

Shared-memory multiprocessors with hardware coherent caches [1,4,6,7,10] are attractive in that they can be programmed relatively easily and that they allow the program to take advantage of the caching of shared data. Recent studies have shown that shared data may exhibit a high cache miss rate, especially if the parallelism is fine-grained [3,9]. In large multiprocessors with considerable memory access latencies, a high cache miss rate may lead to poor machine performance. While optimizations that change the parallel algorithm or the data structures can greatly modify the cache miss behavior, these optimizations often require application knowledge and user intervention. Here we focus only on the optimization of repositioning shared data at the cache block level. The changes are all transparent to the programmer. This approach is motivated by the fact that cache misses on shared data are often concentrated in small sections of the data space. Therefore, localized optimizations can potentially generate most of the desired effects. Figure 1 shows the cache misses on the shared data structures for one application. The average number of misses per byte for each data structure is computed by dividing the total misses on the data structure by the size of the data structure. The leftmost peak corresponds to an area of scalar variables and some small arrays.
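The misses-per-byte metric described above can be sketched in a few lines. This is only an illustration of the arithmetic, not the authors' tooling: the function name, the structure names, and all the numbers in `profile` are hypothetical placeholders rather than measurements from the paper.

```python
def misses_per_byte(total_misses, size_bytes):
    """Average cache misses per byte for one shared data structure."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    return total_misses / size_bytes


# Hypothetical profile (made-up values): name -> (total misses, size in bytes).
profile = {
    "scalars": (12000, 64),
    "grid": (40000, 8192),
    "queue": (3000, 256),
}

# Miss density per structure, then structures ordered from densest to sparsest.
density = {name: misses_per_byte(m, s) for name, (m, s) in profile.items()}
ranked = sorted(density, key=density.get, reverse=True)

for name in ranked:
    print(f"{name}: {density[name]:.2f} misses/byte")
```

Ranking by density rather than by raw miss count is what lets a small region, such as a block of scalar variables, stand out even when a large array accumulates more misses in total.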


Similar resources

Multi-Cache Profiling of Parallel Processing Programs Using Simics

This paper presents a multi-cache profiler for shared-memory multiprocessor systems. For each static data structure in a program, the profiler outputs the read- and write-miss frequencies that are due to cache line migrations. Static data structures whose manipulation results in excessive cache line migrations (potentially a source of excessive false misses) are identified. The...


Reducing Consistency Traffic and Cache Misses in the Avalanche Multiprocessor

For a parallel architecture to scale effectively, communication latency between processors must be avoided. We have found that the source of a large number of avoidable cache misses is the use of hardwired write-invalidate coherency protocols, which often exhibit high cache miss rates due to excessive invalidations and subsequent reloading of shared data. In the Avalanche project at the Universi...


Using supplier locality in power-aware interconnects and caches in chip multiprocessors

Conventional snoopy-based chip multiprocessors take an aggressive approach, broadcasting snoop requests to all nodes. In addition, each node checks all received requests. This approach reduces the latency of cache-to-cache transfer misses at the expense of increased power. In this paper we show that a large portion of interconnect/cache transactions are redundant, as many snoop requests miss in t...


System-level Optimizations for Memory Access in the Execution Migration Machine (EM2)

In this paper, we describe system-level optimizations for the Execution Migration Machine (EM2), a novel shared-memory architecture to address the memory wall and scalability issues for large-scale multicores. In EM2, data is never replicated and threads always migrate to the core where data is statically stored. This enables EM2 not only to provide cache coherence without any complex protocols...


False Sharing and Spatial Locality in Multiprocessor Caches

The performance of the data cache in shared-memory multiprocessors has been shown to be different from that in uniprocessors. In particular, cache miss rates in multiprocessors do not show the sharp drop typical of uniprocessors when the size of the cache block increases. The resulting high cache miss rate is a cause of concern, since it can significantly limit the performance of multiprocessors....




Publication date: 2011
